We consider the problem of detecting whether or not, in a given
sensor network, there is a cluster of sensors which exhibit an "unusual 
behavior." Formally, suppose we are given a set of nodes and attach 
a random variable to each node. We observe a realization of this 
process and want to decide between the following two hypotheses: 
under the null, the variables are i.i.d. standard normal; under the 
alternative, there is a cluster of variables that are i.i.d. normal with 
positive mean and unit variance, while the rest are i.i.d. standard 
normal. We also address surveillance settings where each sensor in the 
network collects information over time. The resulting model is similar, 
now with a time series attached to each node. We again observe the 
process over time and want to decide between the null, where all 
the variables are i.i.d. standard normal, and the alternative, where 
there is an emerging cluster of i.i.d. normal variables with positive 
mean and unit variance. The growth models used to represent the 
emerging cluster are quite general and, in particular, include cellular 
automata used in modeling epidemics. In both settings, we consider 
classes of clusters that are quite general, for which we obtain a lower 
bound on their respective minimax detection rate and show that some 
form of scan statistic, by far the most popular method in practice, 
achieves that same rate to within a logarithmic factor. Our results 
are not limited to the normal location model, but generalize to any 
one-parameter exponential family when the anomalous clusters are 
large enough. 

1. Introduction. We discuss the problem of detecting whether or not, 
in a given network, there is a cluster of nodes which exhibit an "unusual 
behavior." Suppose that we are given a set of nodes with a random variable
attached to each node. We observe a realization of this process and would
like to tell whether all the variables at the nodes have the same behavior, in 
the sense that they are all sampled from a common distribution, or whether 
there is a cluster of nodes at which the variables have a different distribution. 

1.1. A wide array of applications. The task of detection in networks is 
critical for an increasing number of applications, for example, in surveillance 
and environment monitoring. We describe a few of these applications below. 

Detection in sensor networks. The advent of sensor networks [3, 20, 74] 
has multiplied the amount of data and the variety of applications where 
the task of detection is central. Surveillance and environment monitoring 
are prime areas of application for sensor networks. Take, for example, the 
transport of hazardous materials. Currently, some major traffic bottlenecks 
(e.g., airports, subways and borders) use portal monitoring systems [27, 
28]. Sensor networks offer a more flexible, decentralized alternative and are 
considered for the detection of radioactive, biological or chemical materials 
[15, 19, 35]. Sensor networks are also extensively used in other target tracking
settings [11, 50].

Detection in digital signals and images. A digital camera may be seen 
as a sensor network, with CCD or CMOS pixel sensors. As imaging systems 
have been available for quite some time, the literature on detection in images 
is quite extensive, spanning several decades, particularly in satellite imagery 
[18, 29, 58, 65], computer vision [68, 75] and medical imaging [14, 39, 51, 53]. 

Disease outbreak detection. The presence of a biological or chemical ma- 
terial in a given geographical region may also be detected indirectly through 
its impact on human health. In this context, early detection of the disease 
outbreak is crucial in order to minimize the severity of the epidemic. For 
that purpose, some specific information networks are used, with surveillance 
systems now incorporating data from hospital emergency visits, ambulance 
dispatch calls and pharmacy sales of over-the-counter drugs [34, 60, 70]. 

Virus detection in a computer network. Diseases affect computers as 
well, in the form of viruses and worms spreading from host to host in a 
computer network [66]. Affected machines may exhibit slightly anomalous 
behavior (e.g., a loss of performance or violations of specific rules) which 
may be hard to detect on an individual machine. 

Detection from field measurements. In [54], the water quality in a net- 
work of streams in Pennsylvania is assessed by field biologists performing a 
variety of analyses at various locations along the streams; the objective is
to determine whether there are regions of low biological integrity based on
the collected data, and to identify these regions. Other field measurements 
include census data and surveys involving geographical location. 

Detection is, of course, closely related to estimation (i.e., the localization 
or extraction of the anomalous cluster of nodes), but different. This distinc- 
tion is rarely made clear, however. Indeed, reliable detection is possible at 
lower signal-to-noise ratios than reliable estimation and it may be important 
to detect the presence of signals from noisy data without being able to esti- 
mate them. For example, one could imagine developing a surveillance system 
performing detection at relatively low energy/bandwidth costs, yet efficient
at low signal-to-noise ratios, and then switching to estimation mode whenever
the presence of a signal is detected. Another example would be a low
cost preliminary survey involving fewer field measurements, with findings 
subsequently confirmed by a larger, more expensive survey. 

1.2. Mathematical framework. 

1.2.1. Purely spatial model. We loosely model a network with a set of $m$
nodes, denoted by $V_m$. In our examples, we will either assume that $V_m$ is
embedded in a Euclidean space or we will equip $V_m$ with a graph structure.
Our analysis is in the setting of large networks, that is, $m \to \infty$. To each node
$v \in V_m$, we attach a random variable $X_v$. The nodes represent the sources of
information (e.g., sensors) and the variables represent the data they collect.
In some settings, the data collected by each unit is multidimensional, in
which case $X_v$ is a random vector. Our discussion readily generalizes to that
setting.

The random variables are assumed to be independent. For concreteness, 
we consider a normal location model, popular in signal and image processing, 
to model the noise. Our analysis, however, generalizes to any exponential 
family under some conditions on the sizes of the anomalous clusters, such as 
Bernoulli models which arise in sensor arrays where each sensor collects one 
bit (i.e., makes a binary decision) or Poisson models which come up with 
count data, for instance, arising in infectious disease surveillance systems 
[43]. The extension to exponential families is detailed in Section 4.1. 

The situation where no signal is present, that is, "business as usual," is 
modeled as 

$$H_0^m : X_v \sim \mathcal{N}(0, 1) \quad \forall v \in V_m.$$

Fig. 1. Left: a thick cluster is defined as the nodes within a closed curve, which is a mild
deformation of a circle. Right: corresponding noisy data.

Let $K$ be a cluster, which we define for now as a subset of nodes, that is,
$K \subset V_m$. In fact, we will be interested in classes of clusters that are either
derived from a geometric shape, when $V_m$ is embedded in Euclidean space,
or connected components, when $V_m$ has a graph structure. The situation
where the nodes in $K$ behave anomalously is modeled as

$$H_{1,K}^m : X_v \sim \mathcal{N}(\mu_K, 1) \quad \forall v \in K; \qquad X_v \sim \mathcal{N}(0, 1) \quad \forall v \notin K,$$

where $\mu_K > 0$. We choose to decompose $\mu_K$ as $\mu_K = |K|^{-1/2} \Lambda_K$, where $|K|$
denotes the number of nodes in $K$ and $\Lambda_K$ is the signal strength. Indeed,
with this normalization, for any cluster $K$,

(1.1) $$\min_T \bigl[ P(T = 1 \mid H_0^m) + P(T = 0 \mid H_{1,K}^m) \bigr] = 2 P(\mathcal{N}(0, 1) > \Lambda_K / 2),$$

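where the minimum is over all tests for $H_0^m$ versus $H_{1,K}^m$ and the lower bound
is achieved by the likelihood ratio (Neyman-Pearson) test. We define, over the
class of clusters $\mathcal{K}_m$ under consideration,

$$\overline{\Lambda}_m = \max_{K \in \mathcal{K}_m} \Lambda_K, \qquad \underline{\Lambda}_m = \min_{K \in \mathcal{K}_m} \Lambda_K.$$

Figures 1-4 illustrate the setting for various types of clusters.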
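
As a quick numerical illustration of (1.1), the following Python sketch evaluates
the right-hand side, that is, twice the standard normal upper tail at half the signal
strength, for a few values of the signal strength. The function names are
illustrative only and do not come from the paper.

```python
import math


def normal_sf(x: float) -> float:
    """Upper tail P(N(0,1) > x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def min_error_sum(signal_strength: float) -> float:
    """Right-hand side of (1.1): 2 * P(N(0,1) > Lambda_K / 2)."""
    return 2.0 * normal_sf(signal_strength / 2.0)


for lam in (0.0, 1.0, 2.0, 4.0, 8.0):
    print(f"Lambda_K = {lam:3.1f}: minimal sum of errors = {min_error_sum(lam):.4f}")
```

The sum equals 1 when the signal strength is 0 (no better than guessing) and decays
to 0 as the signal strength grows, which is the dichotomy used in the text.
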
Fig. 2. Left: a thin cluster is defined as the nodes within a band around a given curve.
Right: corresponding noisy data.

Let $\mathcal{K}_m$ be a class of clusters within $V_m$ and define

$$H_1^m = \bigcup_{K \in \mathcal{K}_m} H_{1,K}^m.$$

We are interested in testing $H_0^m$ versus $H_1^m$. In other words, under the
alternative, the cluster of anomalous nodes is only known to belong to $\mathcal{K}_m$. We
adopt a minimax point of view. For a test $T$, we define its worst-case risk as

$$\gamma_{\mathcal{K}_m}(T) = P(T = 1 \mid H_0^m) + \max_{K \in \mathcal{K}_m} P(T = 0 \mid H_{1,K}^m).$$

The minimax risk for $H_0^m$ versus $H_1^m$ is defined as

$$\gamma_{\mathcal{K}_m} = \inf_T \gamma_{\mathcal{K}_m}(T).$$

We say that $H_0^m$ and $H_1^m$ are asymptotically inseparable (in the minimax
sense) if

$$\lim_{m \to \infty} \gamma_{\mathcal{K}_m} = 1,$$

which is equivalent to saying that, as $m$ becomes large, no test can perform
substantially better than random guessing, without even looking at the data.
A sequence of tests $(T_m)$ is said to asymptotically separate $H_0^m$ and $H_1^m$ if

$$\lim_{m \to \infty} \gamma_{\mathcal{K}_m}(T_m) = 0,$$

and $H_0^m$ and $H_1^m$ are said to be asymptotically separable if there is such a
sequence of tests. For example, in view of (1.1), for any sequence of clusters
$K_m \subset V_m$, $H_0^m$ and $H_{1,K_m}^m$ are asymptotically inseparable if $\Lambda_{K_m} \to 0$ and
they are asymptotically separable if $\Lambda_{K_m} \to \infty$. For convenience, we assume
that no cluster in the class $\mathcal{K}_m$ is of size comparable to that of the entire
network, that is, $\max\{|K| : K \in \mathcal{K}_m\} = o(m)$. This simplifies the statement
of our results; clusters of size comparable to $m$ can easily be detected using the
test that rejects for large values of $\sum_{v \in V_m} X_v$.
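
To see why the sum test is only suited to very large clusters, the sketch below
computes the risk of a test that rejects when the normalized sum of all the
observations exceeds a threshold. Placing the threshold at half the mean shift is
an assumption made here for illustration (it requires knowing the signal strength
and the cluster size); the function names are hypothetical.

```python
import math


def normal_sf(x: float) -> float:
    """Upper tail P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sum_test_risk(m: int, cluster_size: int, signal_strength: float) -> float:
    """Sum of both error probabilities for the test rejecting when
    sum(X_v) / sqrt(m) exceeds half of its mean under the alternative.

    Under the null the normalized sum is N(0,1); under the alternative its mean
    is |K| * mu_K / sqrt(m) = Lambda_K * sqrt(|K| / m).
    """
    shift = signal_strength * math.sqrt(cluster_size / m)
    return 2.0 * normal_sf(shift / 2.0)


# Same signal strength, very different risks depending on |K| / m.
print(sum_test_risk(100, 100, 4.0))      # cluster is the whole network
print(sum_test_risk(10**6, 100, 4.0))    # cluster is a vanishing fraction
```

When the cluster is the whole network, the risk matches the optimum in (1.1); when
the cluster size is negligible compared to $m$, the risk is close to 1.
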

The situation we just described is purely spatial and relevant in some 
applications not involving time. Such situations are common in image pro- 
cessing. In other applications, especially in surveillance, time is an intrinsic 
part of the setting. In the following section, we modify the model above to 
incorporate time. 

1.2.2. Spatio-temporal model. Building on the framework introduced in
the previous section, we assume that each $X_v$ is now a (discrete) time series,
$(X_v(t), t \in \mathcal{T}_m)$, where $\mathcal{T}_m \subset [0, \infty)$ is finite with $|\mathcal{T}_m| \to \infty$; let
$t_m = \max\{t \in \mathcal{T}_m\}$. Let $\mathcal{K}_m$ be a class of cluster sequences of the form
$(K_t, t \in \mathcal{T}_m)$ such that $K_t \subset V_m$ for all $t \in \mathcal{T}_m$. For example, assuming that
$V_m$ is embedded in a Euclidean space, with norm denoted by $\|\cdot\|$, a space-time
cylinder (e.g., one used in disease outbreak detection [44]) is a cluster sequence
$(K_t, t \in \mathcal{T}_m)$ of the form $K_t = \{v \in V_m : \|v - x_0\| < r_0\}$ if $t > t_0$, and
$K_t = \emptyset$ otherwise, so that $t_0$ is the origin of the cluster in time and $x_0$ its
center. Note that the radius remains constant here. Another example is that of a
space-time cone, of the form $K_t = \{v \in V_m : \|v - x_0\| < C(t - t_0)\}$ if $t > t_0$, and
$K_t = \emptyset$ otherwise, so that $(x_0, t_0)$ is the origin of the cluster in space-time. The
random variables $\{X_v(t) : v \in V_m, t \in \mathcal{T}_m\}$ are assumed to be independent.
This spatio-temporal setting is a special case of the purely spatial setting
with the set of nodes $V_m \times \mathcal{T}_m$. Understood as such, we are interested in
testing $H_0^m$ versus $H_1^m$ as before.

Fig. 3. Left: a band defined around a path. Right: corresponding noisy data.
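
The two cluster sequences just described can be sketched in a few lines of Python.
The square grid, the parameter names and the strict inequalities are choices made
here for illustration; only the general form of the cylinder and the cone comes
from the text.

```python
import math
from itertools import product


def grid_nodes(side: int):
    """Nodes of a side x side square grid, as integer coordinate pairs."""
    return list(product(range(side), repeat=2))


def cylinder(nodes, times, center, radius, t0):
    """Space-time cylinder: a ball of constant radius for t > t0, empty before."""
    return {
        t: ({v for v in nodes if math.dist(v, center) < radius} if t > t0 else set())
        for t in times
    }


def cone(nodes, times, center, speed, t0):
    """Space-time cone: a ball whose radius grows linearly as speed * (t - t0)."""
    return {
        t: ({v for v in nodes if math.dist(v, center) < speed * (t - t0)} if t > t0 else set())
        for t in times
    }


nodes = grid_nodes(5)
cyl = cylinder(nodes, range(4), center=(2, 2), radius=1.5, t0=1)
cn = cone(nodes, range(4), center=(2, 2), speed=1.0, t0=1)
for t in range(4):
    print(t, len(cyl[t]), len(cn[t]))
```

Each sequence is a dictionary mapping a time to the set of anomalous nodes at that
time, which makes the reduction to a purely spatial cluster on the product of the
node set and the time set immediate.
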

1.3. Structured multiple hypothesis testing. Although the detection prob- 
lem formulated above seems of great practical relevance, the statistics lit- 
erature is almost silent on the subject, with the notable exception of the 
closely related topics of change-point analysis [16] and sequential analysis 
[64]. Indeed, the former is a special case of the spatial setting with the
one-dimensional lattice, while the latter is a special case of the spatio-temporal
setting where $V_m$ has only one node. In our context, these two settings are
actually equivalent.

Fig. 4. Left: an arbitrary connected component. Right: corresponding noisy data.

What is further puzzling is that a number of publications addressing the 
task of detection in sensor networks all assume overly simplistic models. For 
example, in [4, 49, 52, 55, 69, 73], the values at the sensors are assumed to all 
have the same distribution under the null and the alternative. That is, either
all of the nodes are all right or they are all anomalous; in our notation,
$\mathcal{K}_m = \{V_m\}$. First, this is not a subtle statistical problem since, in such
circumstances, it suffices to apply the optimal likelihood ratio test. Second, 
this assumption does not make sense in all of the applications described 
above, where the event to be detected is expected to only affect a small 
fraction of locations in the network. 

In stark contrast, in all of the applications described earlier, the set of al- 
ternatives is composite. Viewing each node as performing a test of hypothe- 
ses, which is common in the literature on sensor networks, our problem falls 
within the framework of multiple comparisons. Multiple hypothesis testing 
is a rich and active line of research which is receiving a considerable amount 
of attention within the statistical community at the moment; see [32] and 
references therein. The vast majority of the papers assume that the tests 
are independent of each other, which is clearly not the case here since, in 
general, the class contains clusters that intersect. This is particularly true in 
engineering applications, although this assumption is often made [23, 62, 63]. 

1.4. The scan statistic. We will focus on the test that rejects for large 
values of the following version of the scan statistic:

(1.2) $$\max_{K \in \mathcal{K}_m} \frac{1}{\sqrt{|K|}} \sum_{v \in K} X_v.$$

The chosen normalization is such that each term in the maximization is
standard normal under the null and allows us to compare clusters of different
sizes. It corresponds to the generalized likelihood ratio test in our
context if $\Lambda_K$ is independent of $K \in \mathcal{K}_m$. The scan statistic was originally
proposed in the context of cluster detection in point clouds [30]. This is the
method of matched filters which is ubiquitous in problems of detection in
a wide variety of fields, sometimes in the form of deformable templates in
the engineering literature [38, 51] or their nonparametric equivalent, active
contours or snakes [72]. Note that the scan statistic is the prevalent method
in disease outbreak detection, with many variations [25, 45-47].
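
A minimal sketch of this statistic, assuming a one-dimensional lattice and taking
the class of clusters to be all intervals of bounded length (an illustrative
choice, not one made in the text):

```python
import math


def scan_statistic(x, clusters):
    """Maximum over clusters K of sum_{v in K} x[v] / sqrt(|K|)."""
    return max(sum(x[v] for v in K) / math.sqrt(len(K)) for K in clusters)


def intervals(m: int, min_len: int, max_len: int):
    """All intervals of consecutive nodes of {0, ..., m-1} with the given lengths."""
    return [
        range(start, start + length)
        for length in range(min_len, max_len + 1)
        for start in range(m - length + 1)
    ]


x = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
print(scan_statistic(x, intervals(len(x), 1, 3)))
```

On this toy input the maximum is attained at the interval covering the two elevated
nodes, since the normalization penalizes both shorter and longer intervals.
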

As advocated in [8], we will not use the scan statistic directly in most
cases, but rather restrict the scanning to a subset of $\mathcal{K}_m$. More precisely, we
will introduce, on subsets of nodes $K, L \subset V_m$, the metric





(1.3) $$\delta(K, L) = \sqrt{2} \left( 1 - \frac{|K \cap L|}{\sqrt{|K|\,|L|}} \right)^{1/2}$$
and will restrict the scanning to an $\varepsilon$-net of $\mathcal{K}_m$ with respect to $\delta$, that is,
a subset $\{K_j : j \in J\} \subset \mathcal{K}_m$ with the property that for each $K \in \mathcal{K}_m$, there
is a $j \in J$ such that $\delta(K, K_j) \le \varepsilon$. We will elaborate on this approach in
the Supplement. When $J$ is minimal, we call the resulting statistic an
$\varepsilon$-scan statistic. The approximation precision $\varepsilon$ will be chosen appropriately,
depending on the situation.
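
The idea of scanning over a net can be sketched as follows. Two assumptions are
made here for illustration: the metric is taken to be the Euclidean distance
between the normalized indicator vectors of the two clusters, and the net is built
greedily, which yields a valid net but not necessarily a minimal one.

```python
import math


def delta(K, L) -> float:
    """Assumed metric: Euclidean distance between 1_K / sqrt(|K|) and 1_L / sqrt(|L|)."""
    K, L = set(K), set(L)
    overlap = len(K & L) / math.sqrt(len(K) * len(L))
    return math.sqrt(max(0.0, 2.0 * (1.0 - overlap)))


def greedy_net(clusters, eps: float):
    """Keep a cluster only if it is farther than eps from every cluster kept so far.

    Every discarded cluster is within eps of some kept cluster, so the result is
    an eps-net of the class (possibly larger than a minimal one).
    """
    net = []
    for K in clusters:
        if all(delta(K, L) > eps for L in net):
            net.append(K)
    return net


clusters = [frozenset(range(a, a + 4)) for a in range(17)]
net = greedy_net(clusters, 0.8)
print(len(clusters), len(net))
```

Scanning over `net` instead of `clusters` reduces the number of sums to compute,
at the price of an approximation controlled by `eps`.
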

We focus on $\varepsilon$-scan statistics for two reasons. First, their performance
is easier to analyze than that of the scan statistic itself; in fact, the main
approach to analyzing the scan statistic, the chaining method of Dudley
[26, 67], is via a properly chosen $\varepsilon$-scan statistic. Second, some of the classes
we consider are rather large and we believe that it would be computationally
impractical to scan through all of the clusters in the class; furthermore, our
results show that, from an asymptotic standpoint, no substantial improvement
would be gained by using the full scan statistic.

We also note that the tuning parameter $\varepsilon$ may be dispensed with if we
scan over subsets of different sizes in a multiscale fashion and use a
scale-dependent threshold.

1.5. Existing theoretical results. The vast majority of the literature assumes
that the set of nodes is embedded in some Euclidean space, that is,
$V_m \subset \mathbb{R}^d$. This is the case when the nodes represent spatial locations, such
as in most sensor networks. In this context, the cluster class $\mathcal{K}_m$ is often
derived from a class of domains $\mathcal{A}$ in $\mathbb{R}^d$, in the following way:

(1.4) $$\mathcal{K}_m = \{K = A \cap V_m : A \in \mathcal{A}\}.$$

Most of the literature assumes that the class $\mathcal{A}$ is parametric, exemplified by
deformable templates, for which theoretical results are available, especially 
in the case of the square lattice [8, 13, 24, 40, 57, 71]. In particular, with a 
normal location model, the scan statistic performs well, in the sense that it 
is asymptotically minimax; this is shown in [8] in a slightly different context 
tailored to image processing applications. We also mention the recent work 
[33], which considers the detection of multiple clusters (intervals) of various 
amplitudes in the one-dimensional lattice. As for nonparametric classes of 
domains, [8] argues that the scan statistic is asymptotically minimax for the 
case of star-shaped clusters with smooth boundaries. 

When $V_m$ is endowed with a graph structure, [7] considers paths of a
certain length. In this setting, the scan statistic is shown to be asymptotically
minimax when the graph $V_m$ is a complete, regular tree and near-minimax
for many other types of graphs, such as the $d$-dimensional lattice for $d > 3$.
Addario-Berry et al. [1] considers the same general testing problem with a 
focus on cluster classes defined within the complete graph, such as cliques 
and spanning trees. Note that part of the material presented here appeared 
in [5]. 
1.6. New theoretical results. We describe here in an informal way the
results we obtain.

In Section 2, we focus on situations where the vertex set $V_m$ is embedded
in a Euclidean space and well spread out in a compact domain. Within this
framework, we consider in Section 2.1 a geometric class of clusters obtained
as in (1.4) with $\mathcal{A}$ a class of blobs that are mild deformations of the unit ball.
The clusters obtained in this way are "thick," in the sense that they are not
filamentary. See Figure 1. In particular, this class contains all the common
parametric classes obtained from parametric shapes such as hyperrectangles
and ellipsoids, as long as the shape is not too narrow. Note that the size,
the (exact) shape and the spatial location of the anomalous cluster under
the alternative are unknown. In Corollary 1, we show that (under specific
conditions) $H_0^m$ and $H_1^m$ are asymptotically inseparable if there is $\eta_m \to 0$
slowly enough such that, for all $K \in \mathcal{K}_m$,

$$\Lambda_K < (1 - \eta_m) \sqrt{2 \log(m/|K|)};$$

and conversely, we show that a version of the scan statistic asymptotically
separates $H_0^m$ and $H_1^m$ if there is $\eta_m \to 0$ slowly enough such that, for all
$K \in \mathcal{K}_m$,

$$\Lambda_K > (1 + \eta_m) \sqrt{2 \log(m/|K|)}.$$

Note that the detection rate is the same as for the class of balls so that,
perhaps surprisingly, scanning for the location (and not the shape) is what
drives the minimax detection risk.
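
To give a sense of scale, the following sketch evaluates the boundary
$\sqrt{2 \log(m/|K|)}$ and the corresponding per-node mean $\Lambda_K / \sqrt{|K|}$ for
a few cluster sizes. The network size and the cluster sizes are arbitrary values
chosen for illustration.

```python
import math


def detection_boundary(m: int, cluster_size: int) -> float:
    """Leading term sqrt(2 log(m / |K|)) of the detection threshold on Lambda_K."""
    return math.sqrt(2.0 * math.log(m / cluster_size))


def per_node_mean(signal_strength: float, cluster_size: int) -> float:
    """Mean mu_K = Lambda_K / sqrt(|K|) at each anomalous node."""
    return signal_strength / math.sqrt(cluster_size)


m = 10**6
for size in (10, 100, 10_000):
    lam = detection_boundary(m, size)
    print(f"|K| = {size:6d}: Lambda_K ~ {lam:.3f}, mu_K ~ {per_node_mean(lam, size):.4f}")
```

Larger clusters have a lower boundary in terms of $\Lambda_K$ and a much lower one in
terms of the per-node mean, which is why small shifts spread over many nodes can
still be detected.
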

In Section 2.2, we consider "thin" clusters, obtained as in (1.4) with $\mathcal{A}$
a class of "bands" around smooth curves, surfaces or higher-dimensional
submanifolds. In particular, this class contains hyperrectangles and ellipsoids
that are sufficiently thin; see Figure 2. It turns out that, contrary to
what happens for thick clusters, scanning for the actual shape impacts the
minimax detection risk and is, in fact, the main contributor for some
nonparametric classes. The situation is mathematically more challenging, yet
we are able to prove the following in Proposition 3. Consider the class of
bands of thickness $r_m$ around $C^2$ curves of bounded curvature. Then (under
specific conditions), $H_0^m$ and $H_1^m$ are asymptotically inseparable if

$$\overline{\Lambda}_m r_m^{1/4} (\log m)^{3/2} \to 0.$$

In Theorem 2, we show that, in the same setting, some $\varepsilon_m$-scan statistic
asymptotically separates $H_0^m$ and $H_1^m$ if

$$\underline{\Lambda}_m r_m^{1/4} \to \infty.$$

Hence, some form of scan statistic achieves a detection rate within a factor
of $(\log m)^{3/2}$ from the minimax rate.
In Section 2.3, we consider the spatio-temporal setting. We first consider
cluster sequences that admit a "thick" limit. Cellular automata, which have
been used to model epidemics [2], satisfy this condition in some cases. In
Proposition 5, we show that scanning over space-time cylinders, as done in
disease outbreak detection, achieves the asymptotic minimax risk. We then
consider cluster sequences with controlled space-time variations, which may
be a relevant model for applications such as target tracking [50]. We consider
a fairly general model in Proposition 7.

In Section 3, we assume that $V_m = \{0, 1, \ldots, m^{1/d} - 1\}^d$, with $m^{1/d}$ an
integer, seen as a subgraph of the $d$-dimensional lattice. We first consider, in
Section 3.1, bands around nearest-neighbor paths; see Figure 3. We extend
the results obtained in [7] to paths. For example, consider bands of thickness
$h_m$ around a path of length $\ell_m$, both powers of $m$. The bounds in Theorem
3 imply that $H_0^m$ and $H_1^m$ are asymptotically inseparable if

$$\overline{\Lambda}_m (\ell_m / h_m)^{-1/2} (\log m)^{3/2} \to 0.$$

Conversely, Proposition 8 states that an $\varepsilon$-scan statistic asymptotically
separates $H_0^m$ and $H_1^m$ if

$$\underline{\Lambda}_m (\ell_m / h_m)^{-1/2} \to \infty.$$

Therefore, some form of scan statistic is again within a factor of $(\log m)^{3/2}$
from optimal. In Section 3.2, we consider arbitrary connected components,
constraining only the size; see Figure 4. In Proposition 9, we obtain a sharp
detection rate for clusters of very small size.

1.7. Structure of the paper. We have just described the contents of Sec- 
tions 2 and 3. Section 4 is our discussion section. We extend the results 
obtained for the normal location model to any exponential family in Sec- 
tion 4.1. Other extensions are described in Section 4.2. We state some open 
problems in Section 4.5. In Section 4.4, we briefly discuss the challenge of 
computing the scan statistic. The technical arguments are gathered in the 
Supplement. 

1.8. Notation. For two sequences of real numbers $(a_m)$ and $(b_m)$, $a_m \asymp b_m$
means that $a_m = O(b_m)$ and $b_m = O(a_m)$; $a_m \lesssim b_m$ means that
$a_m < (1 + o(1)) b_m$. For $a, b \in \mathbb{R}$, we use $a \vee b$ (resp., $a \wedge b$) to denote $\max(a, b)$
[resp., $\min(a, b)$]. For $a \in \mathbb{R}$, let $[a]$ be the integer part of $a$; $\lfloor a \rfloor = [a]$ if $a$
is not an integer and $[a] - 1$ otherwise; and $\lceil a \rceil = [a] + 1$. For a set $A$, $|A|$
denotes its cardinality. Define $\log_\dagger(x) = \log x$ if $x > e$ and $= 1$ otherwise. All
the limits in the text are when $m \to \infty$. Throughout the paper, we use $C$
to denote a generic constant, independent of $m$, whose particular value may
change with each appearance. We introduce additional notation in the text.
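
Since the rounding conventions above differ from the usual floor and ceiling at
integer arguments, a small sketch may help. It assumes nonnegative arguments, for
which the integer part coincides with the usual floor; the function names are
illustrative.

```python
import math


def integer_part(a: float) -> int:
    """Integer part [a], assuming a >= 0."""
    return math.floor(a)


def strict_floor(a: float) -> int:
    """[a] if a is not an integer and [a] - 1 otherwise (largest integer below a)."""
    n = integer_part(a)
    return n if a != n else n - 1


def strict_ceil(a: float) -> int:
    """[a] + 1 (smallest integer above a)."""
    return integer_part(a) + 1


def truncated_log(x: float) -> float:
    """log x if x > e, and 1 otherwise."""
    return math.log(x) if x > math.e else 1.0


print(strict_floor(2.0), strict_floor(2.5), strict_ceil(2.0), strict_ceil(2.5))
```

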
2. Clusters as geometric shapes in Euclidean space. We assume that the
nodes are embedded in $\Omega_d \subset \mathbb{R}^d$, a compact set with nonempty interior. Let
$\|\cdot\|$ denote the corresponding Euclidean norm. For $A \subset \mathbb{R}^d$ and $x \in \mathbb{R}^d$, let
$\mathrm{dist}(x, A) = \inf_{y \in A} \|x - y\|$ and for $r > 0$, define

$$B(A, r) = \{x \in \mathbb{R}^d : \mathrm{dist}(x, A) < r\}.$$

In particular, $B(x, r)$ denotes the (open) Euclidean ball with center $x$ and
radius $r$. On occasion, we will add a subscript $d$ to emphasize that this is a
$d$-dimensional ball.

We consider a sequence $(V_m)$ of finite subsets of $\Omega_d$, of size $|V_m| = m$,
that are evenly spread out, in the following sense: there is a constant $C > 1$,
independent of $m$, and a sequence $r_m^* \to 0$ such that

(2.1) $$C^{-1} m r^d < |B(x, r) \cap V_m| < C m r^d \quad \forall r \in [r_m^*, 1],\ \forall x \in \Omega_d.$$

In words, the number of nodes in any ball that is not too small is roughly
proportional to its volume. For the regular lattice with $m$ nodes in
$\Omega_d = [0, 1]^d$, condition (2.1) is satisfied for $r_m^* > \sqrt{d}\, m^{-1/d}$. This is the
smallest possible order of magnitude; indeed, for some constant $C > 0$ and $r$ small
enough, there is a set with more than $C r^{-d}$ disjoint balls of radius $r$ with
centers in $\Omega_d$, and, by (2.1), they are all nonempty if $r > r_m^*$, which forces
$r_m^* > C m^{-1/d}$. Another example of interest is that of $V_m$ obtained by sampling
$m$ points from the uniform distribution, or any other distribution with a density
with respect to the Lebesgue measure on $\Omega_d$, bounded away from zero and infinity;
in that case, (2.1) is satisfied with high probability for $r_m^* > C (\log(m)/m)^{1/d}$
when $C$ is large enough; for an extensive treatment of this situation, see [56],
Chapter 4.
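
Condition (2.1) can be checked numerically on a small example. The sketch below
uses a cell-centered regular lattice in the unit square and a handful of centers
and radii, all of which are choices made here for illustration; it reports the
ratio of the node count in a ball to $m r^d$, which (2.1) requires to stay between
two positive constants.

```python
import math
from itertools import product


def lattice(side: int, d: int = 2):
    """Cell-centered regular lattice with side**d nodes in the unit cube."""
    return [tuple((i + 0.5) / side for i in idx) for idx in product(range(side), repeat=d)]


def ball_count_ratio(nodes, center, r: float) -> float:
    """|B(center, r) intersected with the node set| divided by m * r**d."""
    d = len(center)
    count = sum(1 for v in nodes if math.dist(v, center) < r)
    return count / (len(nodes) * r ** d)


side = 20
nodes = lattice(side)
r_star = math.sqrt(2) / side  # sqrt(d) * m**(-1/d) with d = 2 and m = side**2
centers = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.3), (0.25, 0.9)]
radii = [r_star, 0.1, 0.3, 1.0]
ratios = [ball_count_ratio(nodes, c, r) for c in centers for r in radii]
print(f"min ratio = {min(ratios):.3f}, max ratio = {max(ratios):.3f}")
```

Centers near the boundary of the domain give the smallest ratios, since only part
of the ball lies inside the unit square.
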

2.1. Thick clusters. In this section, we consider clusters as in (1.4), where 
A is a class of bi-Lipschitz deformations of the unit d-dimensional ball. This 
includes the vast majority of all the parametric clusters considered in the 
literature, such as hyperrectangles and ellipsoids, as long as the shape is not 
too narrow. Note that a slightly less general situation is briefly mentioned 
in [8]. 

We start with a lower bound on the minimax detection rate for discrete 
balls of a given radius. 

Proposition 1. Consider $\lambda_m \to 0$ such that $\lambda_m > r_m^*$ and let $\mathcal{K}_m$ be the
class of all discrete balls of radius $\lambda_m$, that is,

$$\mathcal{K}_m = \{K = B(x, \lambda_m) \cap V_m : x \in \Omega_d\}.$$

$H_0^m$ and $H_1^m$ are then asymptotically inseparable if

$$\overline{\Lambda}_m < \sqrt{2 d \log(1/\lambda_m)} - \eta_m,$$

where $\eta_m \to \infty$.



12 E. ARIAS-CASTRO, E. J. CANDES AND A. DURAND 

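We now consider a much larger class of clusters and show that, nevertheless,
a form of scan statistic achieves that same detection rate. For a function
$f : A \subset \mathbb{R}^p \to \mathbb{R}^d$, its Lipschitz constant is defined as

$$\lambda_f = \sup_{x \ne y} \frac{\|f(x) - f(y)\|}{\|x - y\|}.$$

For $\kappa \ge 1$, let $\mathcal{F}_{d,d}(\kappa)$ be the subclass of bi-Lipschitz functions
$f : B(0, 1) \subset \mathbb{R}^d \to \Omega_d$ such that $\lambda_f \lambda_{f^{-1}} \le \kappa$ or, equivalently,

(2.2) $$\frac{\lambda_f}{\kappa} \|x - y\| \le \|f(x) - f(y)\| \le \lambda_f \|x - y\| \quad \forall x, y \in B(0, 1).$$

For a function $f : A \to \mathbb{R}^d$, define

$$K_f = \mathrm{im}(f) \cap V_m, \qquad \mathrm{im}(f) := \{f(x) : x \in A\}.$$

Note that $\lambda_f$ is intimately related to the size of $\mathrm{im}(f)$ and therefore of $K_f$.
Indeed, a simple application of (2.2) implies that, for any $f \in \mathcal{F}_{d,d}(\kappa)$,

(2.3) $$B(f(0), \lambda_f / \kappa) \subset \mathrm{im}(f) \subset B(f(0), \lambda_f).$$

This implies that sets of the form $\mathrm{im}(f)$, with $f \in \mathcal{F}_{d,d}(\kappa)$, are "thick," in
the sense that the smallest ball(s) containing $\mathrm{im}(f)$ and the largest ball(s)
included in $\mathrm{im}(f)$ are of comparable sizes.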
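
As a concrete instance of the sandwich property (2.3), consider an axis-aligned
ellipse, viewed as the image of the unit disc under a diagonal linear map. The
example below is ours: for such a map the Lipschitz constant is the larger
semi-axis, that of the inverse is the reciprocal of the smaller semi-axis, and
their product is the aspect ratio. The sketch checks both inclusions on a grid of
test points centered at the image of the origin.

```python
import math


def ellipse_constants(a: float, b: float):
    """Lipschitz constant of f(x, y) = (a*x, b*y) and its product with that of f^{-1}."""
    lam_f = max(a, b)
    lam_f_inv = 1.0 / min(a, b)
    return lam_f, lam_f * lam_f_inv


def sandwich_holds(a: float, b: float, grid: int = 50) -> bool:
    """Check B(0, lam_f / kappa) inside the ellipse inside B(0, lam_f) on a grid."""
    lam_f, kappa = ellipse_constants(a, b)
    inner, outer = lam_f / kappa, lam_f
    for i in range(grid):
        for j in range(grid):
            x = (i + 0.5) / grid - 0.5
            y = (j + 0.5) / grid - 0.5
            dist = math.hypot(x, y)
            inside = (x / a) ** 2 + (y / b) ** 2 < 1.0
            if dist < inner and not inside:
                return False  # inner ball not contained in the ellipse
            if inside and not dist < outer:
                return False  # ellipse not contained in the outer ball
    return True


print(ellipse_constants(0.4, 0.2), sandwich_holds(0.4, 0.2))
```

A very elongated ellipse would need a large aspect ratio, which is the sense in
which the shapes in the class must not be too narrow.
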

Theorem 1. Consider $\lambda_m \to 0$ such that $\lambda_m > r_m^*$ and define

$$\mathcal{K}_m = \{K_f : f \in \mathcal{F}_{d,d}(\kappa),\ \lambda_f > \lambda_m\}.$$

An $\varepsilon_m$-scan statistic with $\varepsilon_m \to 0$ and $\varepsilon_m (\log(1/\lambda_m))^{1/(2d)} \to \infty$ then
asymptotically separates $H_0^m$ and $H_1^m$ if

$$\underline{\Lambda}_m > \sqrt{2 d \log(1/\lambda_m)} + \eta_m,$$

where $\eta_m = \varepsilon_m^2 \sqrt{2 d \log(1/\lambda_m)}$. Moreover, if $r_m^* \asymp m^{-1/d}$ and

$$\mathcal{K}_m = \{K_f : f \in \mathcal{F}_{d,d}(\kappa)\},$$

then an $\varepsilon_m$-scan statistic with $\varepsilon_m \to 0$ and $\varepsilon_m (\log m)^{1/(2d)} \to \infty$
asymptotically separates $H_0^m$ and $H_1^m$ if

$$\underline{\Lambda}_m > \sqrt{2 \log m} + \eta_m,$$

where $\eta_m = \varepsilon_m^2 \sqrt{2 \log m}$.

Therefore, on a larger class of mild deformations of the unit ball, some
form of scan statistic achieves essentially the same detection rate as for the
class of balls stated in Proposition 1.
We note that the lower bound on $\underline{\Lambda}_m$ is driven by the smaller clusters in
the class and that the performance guarantee is subject to a proper choice
of $\varepsilon_m$. A simple fix for both issues is to combine the tests for different cluster
sizes with an appropriate correction for multiple testing. We summarize the
consequence of Proposition 1 and Theorem 1 with this observation in the
following result, inspired by [71].

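Corollary 1. Consider $\lambda_m \to 0$ such that $\lambda_m > r_m^*$ and define

$$\mathcal{K}_m = \{K_f : f \in \mathcal{F}_{d,d}(\kappa),\ \lambda_m > \lambda_f > r_m^*\}.$$

$H_0^m$ and $H_1^m$ are then asymptotically inseparable if, for all $K \in \mathcal{K}_m$,

$$\Lambda_K < \sqrt{2 \log(m/|K|)} - \eta_m,$$

where $\eta_m \to \infty$. Conversely, let $T_\ell$ be an $\varepsilon_\ell$-scan statistic for the subclass
$\{K_f \in \mathcal{K}_m : 2^{-\ell} \le \lambda_f < 2^{-\ell+1}\}$ with $\varepsilon_\ell\, \ell^{1/(2d)} \to \infty$. There is a test based on
$\{T_\ell : \ell \ge 0\}$ that asymptotically separates $H_0^m$ and $H_1^m$ if, for all $K \in \mathcal{K}_m$,

$$\Lambda_K > \sqrt{2 \log(m/|K|)} + \eta_K,$$

where $\eta_K = \varepsilon_{\ell_K}^2 \sqrt{2 \log(m/|K|)}$ and $\ell_K = \log(m/|K|)$.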

The same procedure, that is, combining $\varepsilon$-scan statistics at different (dyadic)
scales, may be implemented in any of the settings we consider in this paper
to obtain a test that does not depend on a tuning parameter like $\varepsilon_m$ and
achieves the same optimal rate at every size. This is simply due to the fact
that we only need to consider the order of $\log m$ scales and the fast decaying
tails of the scan statistics under the null.
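
One simple way to combine scan statistics across scales is sketched below. It
assumes a plain union (Bonferroni) bound with the level split evenly across
scales, together with the Gaussian tail bound exp(-t^2/2); this is one possible
correction for multiple testing, chosen here for illustration, and not necessarily
the one used in the proofs.

```python
import math


def scale_threshold(net_size: int, num_scales: int, level: float) -> float:
    """Threshold t with net_size * exp(-t**2 / 2) = level / num_scales.

    Since P(N(0,1) > t) <= exp(-t**2 / 2) for t >= 0, the scan statistic over
    net_size clusters exceeds t under the null with probability at most
    level / num_scales.
    """
    return math.sqrt(2.0 * math.log(net_size * num_scales / level))


def multiscale_test(scan_values, net_sizes, level: float = 0.05) -> bool:
    """Reject if the scan statistic at any scale exceeds its own threshold."""
    num_scales = len(scan_values)
    return any(
        value > scale_threshold(size, num_scales, level)
        for value, size in zip(scan_values, net_sizes)
    )


print(scale_threshold(1000, 10, 0.05))
print(multiscale_test([3.0, 6.0], [100, 1000]))
```

Because the threshold grows only like the square root of the logarithm of the
number of scales, using of order $\log m$ scales costs little, in line with the
remark above.
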

Union of thick clusters. In a number of situations, the signal to be detected
may be composed of several clusters. Our results extend readily to
this case. Let $j_m$ be a positive integer and consider sets of the form
$\bigcup_{j=1}^{j_m} K_{f_j}$, where the union is over some $f_j \in \mathcal{F}_{d,d}(\kappa)$ such that, for all
$j, j'$, $\lambda_{f_j} < C \lambda_{f_{j'}}$ and

$$\|f_j(0) - f_{j'}(0)\| < C (\lambda_{f_j} \vee \lambda_{f_{j'}}),$$

so that the sets $\mathrm{im}(f_j)$ and $\mathrm{im}(f_{j'})$ are of comparable sizes and not too far
from each other. In that case, Theorem 1 applies unchanged, as long as
the number of clusters is not too large, specifically if $j_m = o(\log(1/\lambda_m))^{1/d}$.
(This can be improved if the $K_{f_j}$'s do not overlap too much.) If the proximity
constraint is dropped, then the term $\log(1/\lambda_m)$ in Theorem 1 is replaced by
$j_m \log(1/\lambda_m)$.
2.2. Thin clusters. In this section, we consider clusters that are built
from smooth embeddings in $\Omega_d$ of the unit $p$-dimensional ball, where $p < d$.
The special case of curves ($p = 1$) is, for example, relevant in road tracking
[29] and in modeling blood vessels in medical imaging [36]. As in the previous
section, the results we obtain below are valid for (some) unions of such
subsets and, in particular, for submanifolds with a wide array of topologies.

For a differentiable function $f$ between two Euclidean spaces, let $Df$
denote its Jacobian matrix. For $\kappa \ge 1$, let $\mathcal{F}_{p,d}(\kappa)$ be the class of twice
differentiable, one-to-one functions $f : B(0, 1) \subset \mathbb{R}^p \to \mathrm{im}(f) \subset \Omega_d$ satisfying
$\lambda_f \lambda_{f^{-1}} \le \kappa$ and $\lambda_{Df} \le \kappa \lambda_f$. We consider clusters that are tubular regions
around the range of functions in $\mathcal{F}_{p,d}(\kappa)$. For a function $f$ with values in $\mathbb{R}^d$
and $r > 0$, define

$$K_{f,r} = B(\mathrm{im}(f), r) \cap V_m.$$

Again, $\lambda_f$ is intimately related to the size of $B(\mathrm{im}(f), r)$ and $K_{f,r}$. This
relationship is made explicit in the Supplement. We consider classes of clusters
of the form $\{K_{f,r} : f \in \mathcal{F}\}$, where $\mathcal{F}$ is a subclass of $\mathcal{F}_{p,d}(\kappa)$.
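
A tubular cluster around a curve can be built numerically as follows. The sketch
approximates the curve by finitely many sample points, so the band is exact only
up to the sampling resolution; the lattice, the curves and the thickness are
arbitrary choices made for illustration.

```python
import math
from itertools import product


def lattice(side: int):
    """Cell-centered regular lattice with side**2 nodes in the unit square."""
    return [((i + 0.5) / side, (j + 0.5) / side) for i, j in product(range(side), repeat=2)]


def band_cluster(nodes, curve, r: float, samples: int = 200):
    """Nodes within distance r of the curve, approximated by samples + 1 points."""
    pts = [curve(k / samples) for k in range(samples + 1)]
    return {v for v in nodes if any(math.dist(v, p) < r for p in pts)}


nodes = lattice(20)
flat = band_cluster(nodes, lambda t: (t, 0.5), 0.06)
wavy = band_cluster(nodes, lambda t: (t, 0.5 + 0.2 * math.sin(2 * math.pi * t)), 0.06)
print(len(flat), len(wavy))
```

Both bands contain a small fraction of the nodes, yet the class of all such bands
is far richer than the class of balls, which is what makes scanning for the shape
costly in this setting.
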

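We start with a result on the performance of the scan statistic. For a
class $\mathcal{F}$ of functions with values in $\mathbb{R}^d$ and for $\varepsilon > 0$, let $N_\varepsilon(\mathcal{F})$ denote its
$\varepsilon$-covering number for the sup-norm, that is,

$$N_\varepsilon(\mathcal{F}) = \min\Bigl\{ n : \exists f_1, \ldots, f_n \in \mathcal{F} \text{ s.t. } \max_{f \in \mathcal{F}} \min_{j} \|f - f_j\|_\infty \le \varepsilon \Bigr\}.$$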



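Theorem 2. Let $C$ be the constant defined in Lemma B.2 in the Supplement.
Consider $\lambda_m, r_m \to 0$ such that $C^{-1} \lambda_m > r_m > r_m^*$ and let $\mathcal{F}$ be a subclass of
$\mathcal{F}_{p,d}(\kappa)$. Define

$$\mathcal{K}_m = \{K_{f,r} : f \in \mathcal{F},\ \lambda_f > \lambda_m,\ C^{-1} \lambda_m > r > r_m\}.$$

An $\varepsilon_m$-scan statistic with $\varepsilon_m = o(r_m^{1/2})$ then asymptotically separates $H_0^m$ and
$H_1^m$ if

$$\underline{\Lambda}_m > (1 + \varepsilon_m^2) \sqrt{2 \log N_{\varepsilon_m^2}(\mathcal{F}) + 2 d \log(1/\lambda_m)}.$$

Just as in Theorem 1, if $r_m^* \asymp m^{-1/d}$, we can dispense with the restriction
$r_m > r_m^*$ and replace the factor $\log(1/\lambda_m^d)$ by $\log m$ in the bound.

For a typical parametric class $\mathcal{F}$, $\log N_\varepsilon(\mathcal{F}) \sim a(\mathcal{F}) \log(1/\varepsilon)$, so the scan
statistic (over an appropriate net) is accurate if

(2.4) $$\underline{\Lambda}_m > (1 + r_m) \sqrt{2 a(\mathcal{F}) \log(1/r_m) + 2 d \log(1/\lambda_m)}.$$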
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On the other hand, log N £ (J-) x (l/e) a ^ for a typical nonparametric class 
J- [42], so the scan statistic (over an appropriate net) is accurate if 

(2.5) k m rt T)l2 -> oo. 

Finding a sharp lower bound for the minimax detection rate is more chal- 
lenging for thin clusters compared to thick clusters. By considering disjoint 
tubes around p-dimensional hyperrectangles, we obtain a lower bound that 
matches, in order of magnitude, the rate achieved by the scan statistic when 
the class $\mathcal{F}$ is parametric, displayed in (2.4).

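Proposition 2. Consider $\lambda_m, r_m \to 0$ with $\lambda_m \ge r_m \ge r_m^*$. Let $U : \mathbb{R}^p \to \mathbb{R}^d$
be the canonical embedding so that $Ux = (x, 0)$ and let

$$\mathcal{F} = \{f : B(0,1) \subset \mathbb{R}^p \to \Omega_d,\ f(x) = \lambda_m Ux + b, \text{ where } b \in \mathbb{R}^d\}.$$

Define

$$\mathcal{K}_m = \{K_{f,r_m} : f \in \mathcal{F}\}.$$

$H_0^m$ and $H_1^m$ are then asymptotically inseparable if

$$\Lambda_m \le \sqrt{2(d-p)\log(1/r_m) + 2p\log(1/\lambda_m)} - \eta_m,$$

where $\eta_m \to \infty$.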

The proof is parallel to that of Proposition 1 and is therefore omitted. 

For at least one family of nonparametric curves ($p = 1$), we show that the
rate displayed at (2.5) matches the minimax rate, except for a logarithmic
factor. For concreteness, we assume that $\Omega_d = [0,1]^d$. Let $\mathcal{H}(\alpha, \kappa)$ be the
Hölder class of functions $g : [0,1] \to [0,1]$ satisfying

$$|g^{(s)}(x)| \le \kappa \quad \forall x \in [0,1],\ \forall s < \alpha;$$

$$|g^{(\lfloor\alpha\rfloor)}(x) - g^{(\lfloor\alpha\rfloor)}(y)| \le \kappa |x - y|^{\alpha - \lfloor\alpha\rfloor} \quad \forall x, y \in [0,1].$$



Proposition 3. Let $r_m \to 0$ with $r_m \ge r_m^*$. Let $\mathcal{F}$ be the class of functions
of the form $f(x) = (x, g_1(x), \dots, g_{d-1}(x))$, where $g_j \in \mathcal{H}(\alpha, \kappa)$, with
$\alpha \ge 2$. Define

$$\mathcal{K}_m = \{K_{f,r_m} : f \in \mathcal{F}\}.$$

$H_0^m$ and $H_1^m$ are then asymptotically inseparable if

$$\Lambda_m r_m^{1/(2\alpha)} (\log m)^{3/2} \to 0.$$

Thus, for the detection of curves with Hölder regularity, a scan statistic
achieves the minimax rate within a poly-logarithmic factor. We prove
Proposition 3 by reducing the problem to that of detecting a band in a graph,
so that we can use results from Section 3.1. We do not know how to generalize
this approach to higher-dimensional surfaces (i.e., $p \ge 2$).

2.3. The spatio-temporal setting. In this section, we consider the
spatio-temporal setting described in Section 1.2.2. This is a special case of the
spatial setting we have considered thus far, with time playing the role of an
additional dimension. For their relevance in applications and concreteness
of exposition, we focus on two specific models. In Section 2.3.1, we consider
cluster sequences with a limit; as we shall see, this assumption is implicit
in some popular models for epidemics. In Section 2.3.2, we consider cluster
sequences of bounded variation.

In the remainder of this section, we assume, for concreteness, that
$T_m = \{0, 1, \dots, t_m\}$ with $t_m \to \infty$. Our results apply without any changes if the
set of nodes varies with time, that is, with index set of the form $\bigcup_{t \in T_m} V_m^t$,
in the case where each $V_m^t$ satisfies (2.1) with $C$ and $r_m^*$ independent of $t$.

2.3.1. Cluster sequences with a limit. We focus here on cluster sequences
obeying $K_{t_m} \ne \emptyset$, that is, the anomalous cluster is present at the last time
point. This is a standing assumption in syndromic surveillance systems [44].
To illustrate the difference, consider a typical change-point problem setting,
where $V_m$ contains only one node and, for simplicity, assume that $\Lambda_K$ is
independent of $K$ and that $\Lambda_m$ denotes this common value. First, let the
cluster be any discrete interval (in time), so the signal may not be present at
time $t = t_m$. This is a special case of Section 2.1, with time playing the role of
a spatial dimension ($d = 1$); we saw in Corollary 1 that the detection
threshold is at $\Lambda_m \sim \sqrt{2\log|T_m|}$. Now, let the emerging cluster be any discrete
interval that includes $t = t_m$. Detecting such an interval is actually much
easier since we do not need to search for where the interval is located, which is
what drives the detection threshold for the thick clusters in Section 2.1;
we need only determine its length. Specifically, the scan statistic over the
dyadic intervals containing $t = t_m$ asymptotically separates the hypotheses
if $\Lambda_m \gg \sqrt{\log\log|T_m|}$.
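
As a rough illustration of the last point, the following Python sketch computes
a scan statistic over the dyadic windows that end at the final time point, for
a single node. The function name `dyadic_end_scan` and the toy series are
illustrative assumptions, not notation from the paper; each window sum is
divided by the square root of the window length so that it is standard normal
under the null.

```python
import math

def dyadic_end_scan(x):
    """Max over dyadic windows ending at the last index of sum / sqrt(length).

    x holds the observations of one node at times 0, ..., t_m.
    Windows have lengths 1, 2, 4, ... and all contain the last time point.
    Returns (statistic, window_length).
    """
    n = len(x)
    best_stat, best_len = -math.inf, 0
    length = 1
    while length <= n:
        window_sum = sum(x[n - length:])
        stat = window_sum / math.sqrt(length)
        if stat > best_stat:
            best_stat, best_len = stat, length
        length *= 2
    return best_stat, best_len

# Toy series (hypothetical): the last four values carry an elevated mean.
series = [0.1, -0.3, 0.2, -0.1, 1.5, 2.0, 1.0, 1.5]
stat, length = dyadic_end_scan(series)
```

Only about $\log_2 |T_m|$ windows are examined, so no search over the location of
the interval takes place, in line with the discussion above.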

Regarding the actual evolution of the cluster in time, a number of growth 
models have been suggested, for example, cellular automata [37, 61] and their 
random equivalent, threshold growth automata [12, 31], which have been 
used to model epidemics [2]. The latter includes the well-known Richardson
model [59]. Under some conditions, these models develop an asymptotic 
shape (with probability one), a convex polygon in the case of threshold 
growth automata. Less relevant for modeling epidemics, internal diffusion 
limited aggregation is another growth model with a limiting shape [48]. 

The simplest cluster sequences with limiting shape are space-time cylin- 
ders, for which we have the equivalent of Proposition 1. (The proof is com- 
pletely parallel and we omit details.) 

Proposition 4. Consider $\lambda_m \to 0$ with $\lambda_m \ge r_m^*$ and let $\mathcal{K}_m$ be the class
of all space-time cylinders of the form $K_t = B(x, \lambda_m) \cap V_m$, $\forall t = 0, \dots, t_m$,
where $x \in \Omega_d$. $H_0^m$ and $H_1^m$ are then asymptotically inseparable if

$$\Lambda_m \le \sqrt{2d\log(1/\lambda_m)} - \eta_m,$$

where $\eta_m \to \infty$.

With only one possible shape and known starting point, such a model
is rather uninteresting. We now consider a much larger class of cluster
sequences with some sort of limit [in the sense of (2.6)] and show that,
nevertheless, a form of scan statistic achieves that same detection rate. For a
cluster sequence $K = (K_t, t \in T_m)$, let $t_K = \min\{t : K_t \ne \emptyset\}$, which is the
time when $K$ originates. The following is the equivalent of Theorem 1. The
metric $\delta$ appearing below is defined in (1.3).

Proposition 5. Consider sequences $\lambda_m \to 0$ with $\lambda_m \ge r_m^*$ and
$\log\log t_m = o(\log(1/\lambda_m))$, and a function $\nu(t)$ with $\lim_{t\to\infty} \nu(t) = 0$ and $\nu(t) \le 1$ for all
$t \ge 0$. Let $\mathcal{K}_m$ be a class of cluster sequences such that
$t_m - \max\{t_K : K \in \mathcal{K}_m\} \to \infty$ and, for each $K = (K_t, t \in T_m) \in \mathcal{K}_m$, there exists
$f \in \mathcal{F}_{d,d}(\kappa)$ with $\lambda_f \ge \lambda_m$ such that

$$\delta(K_t, \mathrm{im}(f) \cap V_m) \le \nu(t - t_K) \quad \forall t \in T_m. \tag{2.6}$$

There is then a scan statistic over a family of space-time cylinders that
asymptotically separates $H_0^m$ and $H_1^m$ if

$$\Lambda_m \ge (1 + \xi_m)\sqrt{2d\log(1/\lambda_m)},$$

where $\xi_m \to 0$ slowly enough.

If the starting time is uniformly bounded away from $t_m$ and the convergence
to the thick spatial cluster [in the sense of (2.6)] occurs at a uniform
speed, then all of the cluster sequences in the class have sufficient time to
develop into their "limiting" shapes. The space-time cylinders over which
we scan are based on an $\varepsilon$-net for the possible limiting shapes, that is, the
class of thick clusters.

Scanning over space-time cylinders (with balls as bases) is advocated in 
the disease outbreak detection literature [44]. Although seemingly naive, 
this approach achieves, in our asymptotic setting, the minimax detection 
rate if the cluster sequences develop into balls and, in general, falls short by 
a constant factor. 
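
To make the idea of scanning over space-time cylinders concrete, here is a
minimal Python sketch on a one-dimensional grid of nodes. It is only an
illustration under simplifying assumptions: the name `cylinder_scan`, the
integer radii and the brute-force search are ours, the bases are intervals
(balls in dimension one), and every cylinder is required to include the last
time point, as in the standing assumption of this section.

```python
import math

def cylinder_scan(data, radii):
    """Scan over space-time cylinders on a one-dimensional grid of nodes.

    data[t][v] is the observation at time t and node v.
    A cylinder has base {v : |v - center| < radius} (radius a positive
    integer) and covers the time window [start, last], where last is the
    final time index. Returns (statistic, center, radius, start).
    """
    n_times, n_nodes = len(data), len(data[0])
    best = (-math.inf, None, None, None)
    for radius in radii:
        for center in range(n_nodes):
            lo = max(0, center - radius + 1)
            hi = min(n_nodes, center + radius)  # exclusive upper end
            total, count = 0.0, 0
            # Grow the cylinder backwards in time from the last time point.
            for start in range(n_times - 1, -1, -1):
                total += sum(data[start][lo:hi])
                count += hi - lo
                stat = total / math.sqrt(count)  # standard normal under the null
                if stat > best[0]:
                    best = (stat, center, radius, start)
    return best

# Toy data (hypothetical): nodes 1..3 are elevated at the last two times.
data = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0]]
result = cylinder_scan(data, radii=[1, 2])
```

On this toy input the maximizing cylinder has center 2, radius 2 and starting
time 1, which is exactly the elevated block.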

We mention that the equivalent of Corollary 1 holds here as well, in that
we can combine the different scans at different space-time scales to obtain a
test that does not depend on a tuning parameter (implicit here) and which
achieves the same rate for the cluster class defined as above, but with
$\lambda_m \ge \lambda_f \ge r_m^*$, which is the class that appears in Corollary 1.

2.3.2. Cluster sequences of bounded variation. In target tracking [11, 50],
the target is usually assumed to be limited in its movements due to maximum
speed and maneuverability. With this example in mind, we consider classes
of cluster sequences of bounded variation, meaning that the cluster is limited
in the amount it can change in a given period of time. As the rates we obtain
in this subsection are the same with or without the condition $K_{t_m} \ne \emptyset$, we
do not make that assumption. Let $t_K^+ = \max\{t : K_t \ne \emptyset\}$.

We consider space-time tubes around Hölder space-time curves. For
$\alpha \in (0, 1]$ and $\kappa > 0$, let $\mathcal{H}_\infty(\alpha, \kappa)$ be the Hölder class of functions
$g : [0, \infty) \to [0, 1]$ satisfying

$$|g(x) - g(y)| \le \kappa |x - y|^\alpha \quad \forall x, y \in [0, \infty). \tag{2.7}$$

The following is the equivalent of Proposition 3.

Proposition 6. Assume that $\Omega_d = [0,1]^d$. Consider sequences $r_m \to 0$
with $r_m \ge 2r_m^*$ and $\xi_m$ such that $1 \le \xi_m \le t_m$. Let $\mathcal{K}_m$ be the class of all
cluster sequences $K_g$ of the form $K_{g,t} = B(g(t/\xi_m), r_m) \cap V_m$ for all
$t = t_K, \dots, t_K^+$, for some $g = (g_1, \dots, g_d)$ with $g_j \in \mathcal{H}_\infty(\alpha, \kappa)$. Then, $H_0^m$ and $H_1^m$
are asymptotically inseparable if

$$\Lambda_m \big(t_m/\lceil \xi_m r_m^{1/\alpha} \rceil\big)^{-1/2} \log\big(t_m/\lceil \xi_m r_m^{1/\alpha} \rceil\big) \big(\log(\xi_m) + \log\log(t_m)\big)^{1/2} \to 0.$$

Conversely, an $\varepsilon$-scan statistic with $\varepsilon < \sqrt{2}$ asymptotically separates $H_0^m$ and
$H_1^m$ if

$$\Lambda_m \Big(\big(t_m/\lceil \xi_m r_m^{1/\alpha} \rceil\big) \log_\dagger\big(\xi_m^{-1} r_m^{-1/\alpha}\big) + \log m\Big)^{-1/2} \to \infty.$$

For simplicity, assume that $t_m$ is a power of $m$. If $\xi_m r_m^{1/\alpha} = O(1)$, then the
detection threshold is roughly of order $t_m^{1/2}$, while if $\xi_m r_m^{1/\alpha}$ is large, yet small
enough that $t_m/(\xi_m r_m^{1/\alpha})$ is still a power of $m$, then the detection threshold
is roughly of order $(t_m/(\xi_m r_m^{1/\alpha}))^{1/2}$.

A form of scan statistic is actually able to attain the same detection rate
when the radius is unknown, but restricted to $r \ge r_m$. In fact, another form
of scan statistic achieves a slightly different rate over a much larger class of
cluster sequences with bounded variations. Let $\mathcal{S}(r, \kappa)$ be the set of subsets
$S \subset \Omega_d$ such that $B(x, r) \subset S \subset B(x, \kappa r)$ for some $x \in \Omega_d$.

Proposition 7. Consider a sequence $\xi_m$ such that $1 \le \xi_m \le t_m$ and a
constant $\eta > 0$. Define $\mathcal{K}_m$ as the class of cluster sequences $K$ such that,
for each $t = t_K, \dots, t_K^+$, $K_t = S_t \cap V_m$, where $S_t \in \mathcal{S}(r_t, \kappa)$ for some $r_t \ge r_m^*$,
and, for any $s, t = t_K, \dots, t_K^+$,

$$\delta(K_t, K_s) \le \eta \quad \text{if } |t - s| \le \xi_m. \tag{2.8}$$

Then, for $\eta$ small enough, an $\varepsilon$-scan statistic with $\varepsilon < \sqrt{2}$ asymptotically
separates $H_0^m$ and $H_1^m$ if

$$\Lambda_m \big((t_m/\xi_m)(\log m) + \log t_m\big)^{-1/2} \to \infty.$$

Consider the condition

$$\delta(K_t, K_s) \le \nu(|t - s|/\xi_m) \quad \forall s, t \in \{t_K, \dots, t_K^+\}, \tag{2.9}$$

for a function $\nu : [0, \infty) \to [0, \sqrt{2}]$. Then, (2.8) is satisfied with $\eta = \nu(1)$ and
the same $\xi_m$. The requirement in Proposition 7 is that $\nu(1)$ be small enough.
In particular, the cluster sequences considered in Proposition 6 satisfy, for
some constant $C > 0$,

$$\delta(K_t, K_s) \le C r_m^{-1/2} \big(r_m^* \vee (|t - s|/\xi_m)^\alpha\big)^{1/2} \quad \forall s, t \in \{t_K, \dots, t_K^+\}.$$

This comes from Lemma C.1 in the Supplement and (2.7). Therefore,
assuming $\xi_m \ll (r_m^*)^{-1/\alpha}$, (2.9) is satisfied with $\nu(u) = u^{\alpha/2}$ and $\xi_m$ replaced
by $\xi_m r_m^{1/\alpha}$. In that case, the detection rates obtained by the scan statistics
of Propositions 6 and 7 are of comparable orders of magnitude.

3. Clusters as connected components in a graph. In this section, we
model the network with the $d$-dimensional square lattice; specifically, we
assume that $m^{1/d}$ is an integer (for convenience) and consider
$V_m = \{0, 1, \dots, m^{1/d} - 1\}^d$, seen as a subgraph of the usual $d$-dimensional lattice. We assume
that $d \ge 2$ since the case where $d = 1$ is treated in Section 2.1. We work with
the $\ell^1$-norm, which corresponds to the shortest-path distance in the graph;
let $B(v, h)$ denote the corresponding open ball with center $v$ and radius
$h$ so that $B(v, h) = \{v\}$ for $h \in (0, 1]$, and, for a subset of nodes $V$, let
$B(V, h) = \bigcup_{v \in V} B(v, h)$.

3.1. Paths and bands. A nearest-neighbor band of length $\ell$ and width $h$
is of the form $B(V, h)$, where $V = (v_0, \dots, v_\ell)$ forms a path in $\mathbb{Z}^d$. A band
with unit width ($h = 1$) is just a path.
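
The definitions of the open $\ell^1$ ball and of a band can be sketched in a few
lines of Python. This is an illustration only: the helper names `l1_ball` and
`band`, the argument `side` (playing the role of $m^{1/d}$) and the toy path are
assumptions made for the example.

```python
import math
from itertools import product

def l1_ball(center, h, side):
    """Open l1 ball {w : ||w - center||_1 < h} within {0, ..., side-1}^d."""
    reach = math.ceil(h) - 1  # largest integer strictly below h
    ball = set()
    for offset in product(range(-reach, reach + 1), repeat=len(center)):
        if sum(abs(o) for o in offset) < h:
            node = tuple(c + o for c, o in zip(center, offset))
            if all(0 <= x < side for x in node):
                ball.add(node)
    return ball

def band(path, h, side):
    """Band B(V, h): union of the open l1 balls of radius h along the path."""
    nodes = set()
    for v in path:
        nodes |= l1_ball(v, h, side)
    return nodes

# Toy nondecreasing path (hypothetical) in a 4 x 4 lattice.
path = [(0, 0), (1, 0), (1, 1), (2, 1)]
unit_band = band(path, 1, side=4)   # width 1: the path itself
wide_band = band(path, 2, side=4)   # width 2: path plus lattice neighbors
```

With width 1 the band reduces to the path, as stated above; with width 2 each
node contributes itself and its nearest neighbors inside the lattice.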

We say that a path $(v_0, \dots, v_\ell)$ in $\mathbb{Z}^d$ is nondecreasing if, for all $t = 1, \dots, \ell$,
$v_t - v_{t-1}$ has exactly one coordinate equal to 1 and all other coordinates equal
to 0. The case of paths was treated in detail in [7]; it corresponds to taking
$h_m = 1$ below.

Theorem 3. Suppose that $d \ge 2$ and let $\mathcal{K}_m$ be the class of bands of
width $h_m$ generated by nondecreasing paths in $V_m$ of length $\ell_m$, starting
at the origin, with $m^{1/d} \ge \ell_m \ge h_m$. Then, $H_0^m$ and $H_1^m$ are asymptotically
inseparable if

$$\Lambda_m (\ell_m/h_m)^{-1/2} \log_\dagger(\ell_m) \big(\log h_m + \log_\dagger\log \ell_m\big)^{1/2} \to 0 \quad \text{for } d = 2,$$

$$\Lambda_m (\ell_m/h_m)^{-1/2} (\log_\dagger h_m)(\log_\dagger\log h_m) \to 0 \quad \text{for } d \ge 3.$$

Conversely, an $\varepsilon$-scan statistic with $\varepsilon < \sqrt{2}$ fixed asymptotically separates
$H_0^m$ and $H_1^m$ if

$$\Lambda_m (\ell_m/h_m)^{-1/2} \to \infty.$$

For the case of nondecreasing paths, a form of the scan statistic achieves
the minimax rate in dimension $d \ge 3$, while it falls short by a logarithmic
factor in dimension $d = 2$. In the latter setting, Arias-Castro et al. [7]
introduce a test that asymptotically separates $H_0^m$ and $H_1^m$ if

$$\Lambda_m (\ell_m/h_m)^{-1/2} \log_\dagger(\ell_m)^{1/2} \to \infty, \tag{3.1}$$

coming slightly closer to the minimax rate.

In fact, even when the band has unknown length, width and starting 
location, and when the path is not restricted to be nondecreasing, a form of 
scan statistic achieves the same rate, except for a logarithmic factor. 

Proposition 8. Suppose that $d \ge 2$ and let $\mathcal{K}_m$ be the class of all bands
of width $h$ and length $\ell$, where $\ell_m \ge \ell \ge h \ge h_m$, that are within $V_m$ and
generated by paths that do not self-intersect. An $\varepsilon$-scan statistic, with $\varepsilon < \sqrt{2}$,
then asymptotically separates $H_0^m$ and $H_1^m$ if

$$\Lambda_m \big(\ell_m/h_m + \log(m/h_m^d) + \log_\dagger\log \ell_m\big)^{-1/2} \to \infty.$$

3.2. Arbitrary connected components. We consider here classes of con- 
nected components with a constraint on their sizes. Arbitrary connected 
components in the square lattice are sometimes called animals or polyomi- 
noes (polycubes in dimension d > 3), which are well-studied objects in com- 
binatorics, where the goal is to count the number of polyominoes [41]. We 
mention in passing the results in [22] which provide a law of large numbers 
for the scan statistic under the null. Otherwise, such objects are fairly new 
to statistics. Detecting animals is, of course, harder than detecting paths 
since paths are themselves animals. The result below offers a sharp detec- 
tion threshold for connected components of sufficiently small size. 

Proposition 9. Let $\mathcal{K}_m$ be the class of animals of size $k_m = o(m)$
within $V_m$. $H_0^m$ and $H_1^m$ are then asymptotically inseparable if

$$\Lambda_m \le \sqrt{2\log m} - \eta_m,$$

where $\eta_m \to \infty$. Conversely, let $\mathcal{K}_m$ be the class of animals of size not
exceeding $k_m = o(\log m)$ within $V_m$. The actual scan statistic then asymptotically
separates $H_0^m$ and $H_1^m$ if

$$\Lambda_m \ge \sqrt{2\log m}.$$

Note that, in general, we can obtain a quick (naive) upper bound on the
detection rate for large clusters by considering the simple test that rejects
for large values of $\sum_{v \in V_m} X_v$ (this is the "average test" in [1]). This test
asymptotically separates $H_0^m$ and $H_1^m$ if $\Lambda_m k_m^{1/2} m^{-1/2} \to \infty$, assuming the
clusters in $\mathcal{K}_m$ are of size bounded below by $k_m$. An open question of
theoretical interest is whether, for the class of animals of size $k_m = \sqrt{m}$ in the
two-dimensional lattice, there is a test that asymptotically separates $H_0^m$
and $H_1^m$ when $\Lambda_m k_m^{-1/2} \to 0$ slowly enough. In dimension three or higher,
Theorem 3 implies that this is not possible, even for paths.
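
The average test mentioned above is simple enough to write down directly. The
Python sketch below is an illustration under the normal model with unit
variance; the function name `average_test` and the fixed level 0.05 are
assumptions made for the example (the asymptotic statements in the text do not
fix a level).

```python
import math
from statistics import NormalDist

def average_test(x, level=0.05):
    """Reject for large values of the sum of all observations.

    x holds one observation per node. The sum is divided by sqrt(m),
    so the statistic is standard normal under the null.
    Returns (statistic, reject).
    """
    m = len(x)
    stat = sum(x) / math.sqrt(m)
    threshold = NormalDist().inv_cdf(1 - level)
    return stat, stat > threshold

# Toy network (hypothetical): 100 nodes, 25 of them with value 2.
stat, reject = average_test([0.0] * 75 + [2.0] * 25)
```

The test ignores the location and shape of the cluster entirely, which is why
it is only competitive when the anomalous cluster occupies a sizable part of
the network.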

4. Discussion. 

4.1. Extension to exponential families. Although the previous results
were stated for the normal location model, they extend to any one-parameter
exponential model if the anomalous clusters are large enough. For example,
consider a Bernoulli model where the variables are Bernoulli with
parameter 1/2 under the null and with parameter $p_K > 1/2$ when they
belong to the anomalous cluster $K$; or, a Poisson model where the variables
are Poisson with mean 1 under the null and $\mu_K > 1$ when they belong
to the anomalous cluster $K$. In general, transforming the variables and/or
the parameter if necessary, we may assume that the model is of the form
$F_\theta$, with density $f_\theta(x) = \exp(\theta x - \log\varphi(\theta))$ with respect to $F_0$, where, by
definition, $\varphi(\theta) = E_0[\exp(\theta X)]$, where $E_0$ denotes the expectation under
$F_0$. We always assume that $\varphi(\theta) < \infty$ for $\theta$ in a neighborhood of 0. Let
$\sigma^2 = \mathrm{Var}_0(X)$, the variance of $X \sim F_0$. In the Bernoulli model, the
correspondence is $\theta = \log(p/(1 - p))$ and $\sigma^2 = 1/4$; in the Poisson model with
mean $\lambda$, $\theta = \log\lambda$ and $\sigma^2 = 1$. Under the null hypothesis, all of the variables
at the nodes have distribution $F_0$, that is,

$$H_0^m : X_v \sim F_0 \quad \forall v \in V_m.$$

Under the alternative, the variables at the nodes belonging to the anomalous
cluster $K \in \mathcal{K}_m$ have distribution $F_{\theta_K}$ with $\theta_K := \sigma^{-1} \Lambda_K |K|^{-1/2}$, that is,

$$H_{1,K}^m : X_v \sim F_{\theta_K} \quad \forall v \in K; \qquad X_v \sim F_0 \quad \forall v \notin K.$$

As before, the variables are assumed to be independent.
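
The correspondence between the natural parameter and the usual Bernoulli and
Poisson parameters can be sketched as follows. The calibration
$\theta_K = \Lambda_K/(\sigma |K|^{1/2})$ used in the code is our reading of the display
above and should be treated as an assumption; the function names are
hypothetical.

```python
import math

def natural_parameter(signal, cluster_size, sigma):
    """theta_K = Lambda_K / (sigma * sqrt(|K|)), the calibration assumed here."""
    return signal / (sigma * math.sqrt(cluster_size))

def bernoulli_p(theta):
    """Invert theta = log(p / (1 - p)); theta = 0 gives the null value 1/2."""
    return 1.0 / (1.0 + math.exp(-theta))

def poisson_mean(theta):
    """Invert theta = log(mean); theta = 0 gives the null mean 1."""
    return math.exp(theta)

# Hypothetical cluster of 100 nodes with normalized signal strength 2.
theta_bernoulli = natural_parameter(2.0, 100, sigma=0.5)  # sigma^2 = 1/4
theta_poisson = natural_parameter(2.0, 100, sigma=1.0)    # sigma^2 = 1
p_K = bernoulli_p(theta_bernoulli)
mu_K = poisson_mean(theta_poisson)
```

For small $\theta$ these maps are close to their first-order expansions,
$p \approx 1/2 + \theta/4$ and mean $\approx 1 + \theta$.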

If the clusters in the class are sufficiently large, then the results presented for
the normal location family hold unchanged. Intuitively, large enough clusters
allow for the sums over them to be approximately normally distributed.
Details are provided in the Supplement. For example, we have the following
equivalent of Corollary 1 in the context of thick clusters as in Section 2.1.
Consider $\lambda_m \ge r_m > 0$ with $m r_m^d (\log(1/r_m))^{-3} \to \infty$ (which guarantees that
the clusters in the class are large enough) and define the class

$$\mathcal{K}_m = \{K_f : f \in \mathcal{F}_{d,d}(\kappa),\ \lambda_m \ge \lambda_f \ge r_m\}.$$

In this setting, under the Bernoulli model, the detection threshold is at

$$p_K = \frac{1}{2} + \frac{1}{2\sqrt{|K|}} \sqrt{2\log(m/|K|)};$$

under the Poisson model, the detection threshold is at

$$\mu_K = 1 + \frac{1}{\sqrt{|K|}} \sqrt{2\log(m/|K|)}.$$

Note that without a lower bound on the minimum size of the anomalous
clusters, the general analysis breaks down and the results depend on the
specific exponential model. For example, unless $\min\{|K| : K \in \mathcal{K}_m\} \to \infty$ fast
enough, detection is impossible in the Bernoulli model, even if the anomalous
nodes have value 1 under the alternative.

4.2. Other extensions. The array of possible models is as wide as the
breadth of real-world applications. We mention a few possible variations
below.

Beyond exponential families. Using an exponential family of distributions
allows us to obtain sharp detection lower bounds. Otherwise, similar
results, although not as sharp, may be obtained for essentially any family
of distributions $F_\theta$, where the distance between the null $\theta = 0$ and an
alternative $\theta$ is in terms of the chi-square distance between $F_0$ and $F_\theta$; see [7],
Section 5.

Different means at the nodes. We could consider a situation where the
mean varies over the nodes of the anomalous cluster. This situation is
considered in [33] for the case of intervals, and the constant in the detection
rate is indeed different. We implicitly considered a worst case scenario where
the mean is bounded below over the anomalous cluster and subsequently
assumed it was equal to that lower bound everywhere over the anomalous
cluster. However, our results hold unchanged if we allow $X_v$ to have any
mean above $\theta_K$, for every $v \in K$, $K$ being the anomalous cluster.

Dependencies. Also of interest is the case where the variables are
dependent. In the spatial setting, the same paper [33] solves this problem for the
case of the one-dimensional lattice, with the correlation between $X_v$ and
$X_w$ decaying as a function of distance between $v$ and $w$. We postulate that
the same result holds in higher dimensions. In the spatio-temporal setting,
variables could be dependent across time as well, involving a higher degree
of sophistication. We plan on pursuing these generalizations in future
publications.



Unknown variance or other parameters. We assumed throughout that
the variance was known (and equal to 1 after normalization). This is, in
fact, a mild assumption, as one can consistently estimate the variance using
a robust estimator, say the median absolute deviation (MAD), with the usual
$\sqrt{m}$-convergence rate, assuming that the anomalous cluster corresponds to a
small part of the entire network. When dealing with one-parameter families
such as Bernoulli or Poisson, the issue is to estimate the parameter under the
null and a robust version of the maximum likelihood (e.g., trimmed mean
for these two examples) can be used for that purpose.
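
A minimal Python sketch of the MAD-based scale estimate follows. The function
name `mad_scale` is hypothetical; the factor 1.4826 is the usual constant that
makes the MAD consistent for the standard deviation under normality.

```python
from statistics import median

def mad_scale(x):
    """Robust standard deviation estimate: 1.4826 * median(|x - median(x)|)."""
    center = median(x)
    mad = median(abs(v - center) for v in x)
    return 1.4826 * mad

# Toy sample (hypothetical): one large value barely affects the estimate.
scale = mad_scale([1.0, 2.0, 3.0, 4.0, 100.0])
```

Because both the centering and the spread are medians, a small anomalous
cluster changes the estimate very little, which is the property the paragraph
above relies on.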

4.3. Energy, bandwidth and other constraints. We assume throughout 
that a central processor has access to all the information measured at the 
nodes and, based on that, makes a decision as to whether there is an anoma- 
lous cluster of nodes in the network or not. This assumption is reasonable in, 
for example, the context of image processing or syndromic surveillance. However,
real-world sensor networks of the wireless type are often constrained by
energy and/or bandwidth considerations. A growing body of literature [50] 
is dedicated to designing efficient (e.g., decentralized) communication pro- 
tocols for sensor networks under such constraints. As mentioned in Section 
1.3, the papers we are aware of consider very simplistic detection settings. 
In the context of the present paper, it would be interesting to study how the 
detection rates change when different communication protocols are used. 

We also assume that we have infinite computational power. However, all 
real-world systems operate under finite energy and processing resources. In 
the same way, it would be interesting to know what detection rates are 
achievable under such computational constraints. 

4.4. On computing the scan statistic. In all of the settings we consider in 
this paper, the scan statistic comes close to achieving the minimax detection 
rate. Turning to computational issues, however, it is very demanding, even 
when scanning for simple parametric clusters such as rectangles. For general 
shapes, Duczmal, Kulldorff and Huang [25] suggest a simulated annealing
algorithm, which, from a theoretical point of view, is extremely difficult to
analyze. For parametric shapes and blobs, Arias-Castro, Donoho and Huo
[8] advocate the use of $\varepsilon_m$-scan statistics based on multiscale nets built out
of unions of dyadic hypercubes; similar ideas appear in [71]. Partial results
suggest that this approach yields, in theory, a near-optimal algorithm for 
detecting the more general thick clusters considered in Section 2.1. 

For the thin clusters of Section 2.2, or for the bands of Section 3.1, the 
situation is quite different. Take the latter. After pre-processing the data by 
performing a moving average with an appropriate radius, it remains to find 
the maximum over a restricted, yet exponentially large, set of paths.
Without further restriction, this problem, known as the "bank robber problem"
or "reward budget problem" [21], is NP-hard. Note that DasGupta et al. [21]
suggest a polynomial time approximation that deserves further investigation.
The case of thin clusters is even harder. In the context of point clouds,
Arias-Castro, Efros and Levi [10] introduce multiscale nets that could be
adapted to the setting of a network. It remains to compute the scan statistic
over this net, which seems particularly challenging for surfaces of dimension
$p \ge 2$, which no longer correspond to paths. In the spatio-temporal setting
of Section 2.3, dynamic programming ideas could be used, as done in [9] in 
the context of point clouds and in [17] in the context of a harmonic analysis 
decomposition of chirps. 
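
For the restricted class of nondecreasing paths of Section 3.1, the maximization
over paths can be carried out exactly by dynamic programming. The Python
sketch below illustrates this in dimension two; it is not the algorithm of any
of the cited papers, the function name `best_nondecreasing_path` is hypothetical,
and the input is assumed to have already been pre-processed (for instance, by
the moving average mentioned above).

```python
import math

def best_nondecreasing_path(x, length):
    """Largest normalized sum over nondecreasing paths from the origin.

    x[i][j] is the (possibly pre-averaged) value at node (i, j).
    A path takes `length` steps, each adding 1 to exactly one coordinate,
    so it visits length + 1 nodes and ends on the line i + j = length.
    """
    rows, cols = len(x), len(x[0])
    best = [[-math.inf] * cols for _ in range(rows)]
    best[0][0] = x[0][0]
    for i in range(rows):
        for j in range(cols):
            if (i == 0 and j == 0) or i + j > length:
                continue
            # Best path into (i, j) arrives from the node above or to the left.
            prev = max(best[i - 1][j] if i > 0 else -math.inf,
                       best[i][j - 1] if j > 0 else -math.inf)
            best[i][j] = x[i][j] + prev
    end_values = [best[i][length - i]
                  for i in range(rows) if 0 <= length - i < cols]
    return max(end_values) / math.sqrt(length + 1)

# Toy grid (hypothetical): a staircase of ones from (0, 0) to (2, 2).
grid = [[1, 0, 0],
        [1, 1, 0],
        [0, 1, 1]]
stat = best_nondecreasing_path(grid, length=4)
```

The running time is proportional to the number of nodes, in contrast with the
exponentially large number of candidate paths; the restriction to
nondecreasing paths is what makes this possible.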

4.5. Open theoretical problems. The paper leaves two main theoretical 
problems unresolved. The first one concerns obtaining sharper bounds for 
the detection of thin clusters. This is in the context of Section 2.2. For 
parametric classes, the challenge is to match constants in the rate, while,
for nonparametric classes, the challenge is to obtain sharper lower bounds,
perhaps closer to what a scan statistic is shown to achieve in Theorem 1. 
We were only able to do the latter for curves; see Proposition 3. 

The second one concerns comparing the detection rates for arbitrary con- 
nected components and for paths. At a given size, the thicker the band 
(relative to its length), the easier it is to detect it; see Theorem 3. It seems, 
therefore, that the most difficult connected components to detect are paths 
or unions of paths. But is this true? In other words, are the minimax detec- 
tion rates for arbitrary connected components and paths of a similar order 
of magnitude? 

Acknowledgments. The authors are grateful to the anonymous referees 
for suggesting an expansion of the discussion section, for encouraging them 
to obtain sharper bounds and for alerting them to the possibility of improving
on the performance of the scan statistic by using a different threshold
for each scale, which resulted in Corollary 1. 

SUPPLEMENTARY MATERIAL 

Supplement: Technical Arguments (DOI: 10.1214/10-AOS839SUPP; .pdf). 
In the supplementary file [6], we prove the results stated here. It is di- 
vided into three sections. In the first section, we state and prove general 
lower bounds on the minimax rate and upper bounds on the detection rate 
achieved by an $\varepsilon$-scan statistic. We do this for the normal location model
first and extend these results to a general one-parameter exponential family. 
In the second section, we gather a number of results on volumes and node 
counts. In the third and last section, we prove the main results.